Overview

Dataset statistics

Number of variables: 24
Number of observations: 4741
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 34
Duplicate rows (%): 0.7%
Total size in memory: 5.1 MiB
Average record size in memory: 1.1 KiB

Variable types

Numeric: 6
Categorical: 4
Boolean: 14

Alerts

Dataset has 34 (0.7%) duplicate rows [Duplicates]
T3 is highly overall correlated with TT4 and 4 other fields [High correlation]
TT4 is highly overall correlated with T3 and 2 other fields [High correlation]
FTI is highly overall correlated with T3 and 2 other fields [High correlation]
on_thyroxine is highly overall correlated with major_class [High correlation]
pregnant is highly overall correlated with T4U [High correlation]
T4U is highly overall correlated with pregnant and 2 other fields [High correlation]
referral_source is highly overall correlated with T3 and 1 other field [High correlation]
Class is highly overall correlated with T3 and 3 other fields [High correlation]
major_class is highly overall correlated with on_thyroxine and 1 other field [High correlation]
query_on_thyroxine is highly imbalanced (89.7%) [Imbalance]
on_antithyroid_medication is highly imbalanced (91.1%) [Imbalance]
sick is highly imbalanced (75.5%) [Imbalance]
pregnant is highly imbalanced (86.0%) [Imbalance]
thyroid_surgery is highly imbalanced (90.3%) [Imbalance]
I131_treatment is highly imbalanced (88.7%) [Imbalance]
query_hypothyroid is highly imbalanced (63.5%) [Imbalance]
query_hyperthyroid is highly imbalanced (64.3%) [Imbalance]
lithium is highly imbalanced (96.1%) [Imbalance]
goitre is highly imbalanced (93.3%) [Imbalance]
tumor is highly imbalanced (81.7%) [Imbalance]
hypopituitary is highly imbalanced (99.5%) [Imbalance]
psych is highly imbalanced (74.2%) [Imbalance]
Class is highly imbalanced (65.2%) [Imbalance]
major_class is highly imbalanced (58.8%) [Imbalance]
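
The imbalance percentages in these alerts appear consistent with one minus the normalized Shannon entropy of a variable's value counts. That reading is an assumption, checked here only against counts reported elsewhere in this report (lithium: 4721 False / 20 True; goitre: 4703 False / 38 True). The helper name `imbalance_score` is hypothetical and is not claimed to be part of ydata-profiling's API. A minimal sketch:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class counts and k is the number of classes.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. A single-class input returns 0.0 by convention in this sketch.
    """
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(k)

# Counts taken from the lithium and goitre sections of this report.
lithium = imbalance_score([4721, 20])
goitre = imbalance_score([4703, 38])
print(f"lithium: {lithium:.1%}, goitre: {goitre:.1%}")
```

Under this definition a rare flag such as hypopituitary (2 True out of 4741 rows) scores close to 100%, which matches the ordering of the alerts.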

Reproduction

Analysis started: 2023-04-27 12:12:18.853881
Analysis finished: 2023-04-27 12:12:34.381196
Duration: 15.53 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json

Variables

age
Real number (ℝ)

Distinct: 93
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.00464
Minimum: 1
Maximum: 455
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 20
Q1: 36
Median: 55
Q3: 68
95th percentile: 80
Maximum: 455
Range: 454
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 20.03123
Coefficient of variation (CV): 0.3851816
Kurtosis: 33.384602
Mean: 52.00464
Median Absolute Deviation (MAD): 15
Skewness: 1.5016181
Sum: 246554
Variance: 401.25019
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
59 127
 
2.7%
60 123
 
2.6%
70 113
 
2.4%
55 106
 
2.2%
73 106
 
2.2%
63 103
 
2.2%
72 102
 
2.2%
34 97
 
2.0%
62 96
 
2.0%
65 94
 
2.0%
Other values (83) 3674
77.5%
ValueCountFrequency (%)
1 9
0.2%
2 7
0.1%
4 2
 
< 0.1%
5 2
 
< 0.1%
6 1
 
< 0.1%
7 9
0.2%
8 3
 
0.1%
10 2
 
< 0.1%
11 5
0.1%
12 7
0.1%
ValueCountFrequency (%)
455 1
 
< 0.1%
94 2
 
< 0.1%
93 4
 
0.1%
92 2
 
< 0.1%
91 2
 
< 0.1%
90 5
 
0.1%
89 9
0.2%
88 11
0.2%
87 18
0.4%
86 8
0.2%
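
The maximum age of 455 lies far outside the rest of the distribution (95th percentile 80, next-largest value 94) and is probably a data-entry error. One common screen for such values is Tukey's 1.5 × IQR rule. The report itself does not apply this rule, so the sketch below is only an illustration built on the quartiles reported above; `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles for age as reported: Q1 = 36, Q3 = 68 (IQR = 32).
lower, upper = iqr_fences(36, 68)

# A few extreme ages listed in the report's minimum/maximum value tables.
extreme_ages = [1, 2, 94, 455]
flagged = [age for age in extreme_ages if age < lower or age > upper]
print(lower, upper, flagged)
```

With these quartiles only 455 falls outside the fences; the low ages (1, 2) are kept because the lower fence is negative.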

sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 305.6 KiB
F: 3370
M: 1371

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4741
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: M
4th row: F
5th row: F

Common Values

ValueCountFrequency (%)
F 3370
71.1%
M 1371
28.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 3370
71.1%
m 1371
28.9%

Most occurring characters

ValueCountFrequency (%)
F 3370
71.1%
M 1371
28.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4741
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 3370
71.1%
M 1371
28.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 4741
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 3370
71.1%
M 1371
28.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4741
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 3370
71.1%
M 1371
28.9%

on_thyroxine
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4119
True: 622
ValueCountFrequency (%)
False 4119
86.9%
True 622
 
13.1%

query_on_thyroxine
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4677
True: 64
ValueCountFrequency (%)
False 4677
98.7%
True 64
 
1.3%

on_antithyroid_medication
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4688
True: 53
ValueCountFrequency (%)
False 4688
98.9%
True 53
 
1.1%

sick
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4549
True: 192
ValueCountFrequency (%)
False 4549
96.0%
True 192
 
4.0%

pregnant
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4647
True: 94
ValueCountFrequency (%)
False 4647
98.0%
True 94
 
2.0%

thyroid_surgery
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4682
True: 59
ValueCountFrequency (%)
False 4682
98.8%
True 59
 
1.2%

I131_treatment
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4669
True: 72
ValueCountFrequency (%)
False 4669
98.5%
True 72
 
1.5%

query_hypothyroid
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4410
True: 331
ValueCountFrequency (%)
False 4410
93.0%
True 331
 
7.0%

query_hyperthyroid
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4421
True: 320
ValueCountFrequency (%)
False 4421
93.3%
True 320
 
6.7%

lithium
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4721
True: 20
ValueCountFrequency (%)
False 4721
99.6%
True 20
 
0.4%

goitre
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4703
True: 38
ValueCountFrequency (%)
False 4703
99.2%
True 38
 
0.8%

tumor
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4609
True: 132
ValueCountFrequency (%)
False 4609
97.2%
True 132
 
2.8%

hypopituitary
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4739
True: 2
ValueCountFrequency (%)
False 4739
> 99.9%
True 2
 
< 0.1%

psych
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.7 KiB
False: 4535
True: 206
ValueCountFrequency (%)
False 4535
95.7%
True 206
 
4.3%

TSH
Real number (ℝ)

Distinct: 287
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.6655368
Minimum: 0.005
Maximum: 530
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 0.005
5th percentile: 0.02
Q1: 0.5
Median: 1.4
Q3: 3
95th percentile: 23
Maximum: 530
Range: 529.995
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 29.266519
Coefficient of variation (CV): 4.3907219
Kurtosis: 165.05369
Mean: 6.6655368
Median Absolute Deviation (MAD): 1.1
Skewness: 11.567816
Sum: 31601.31
Variance: 856.52911
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1.4 494
 
10.4%
0.2 150
 
3.2%
1.3 119
 
2.5%
1.1 104
 
2.2%
1.5 89
 
1.9%
0.02 86
 
1.8%
0.1 86
 
1.8%
0.25 85
 
1.8%
1.9 85
 
1.8%
1.2 85
 
1.8%
Other values (277) 3358
70.8%
ValueCountFrequency (%)
0.005 74
1.6%
0.01 39
0.8%
0.015 45
0.9%
0.02 86
1.8%
0.025 25
 
0.5%
0.03 38
0.8%
0.035 26
 
0.5%
0.04 25
 
0.5%
0.045 16
 
0.3%
0.05 66
1.4%
ValueCountFrequency (%)
530 2
< 0.1%
478 2
< 0.1%
472 2
< 0.1%
468 2
< 0.1%
440 2
< 0.1%
400 2
< 0.1%
236 2
< 0.1%
230 2
< 0.1%
199 2
< 0.1%
188 2
< 0.1%
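
The descriptive statistics reported for TSH are linked by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below recomputes those three quantities from the reported figures (with 4741 observations taken from the overview). The variable names are illustrative only.

```python
# Figures copied from the TSH section and the dataset overview.
n_observations = 4741
reported_sum = 31601.31
reported_mean = 6.6655368
reported_std = 29.266519

# Recompute the derived statistics from their definitions.
mean_from_sum = reported_sum / n_observations
coefficient_of_variation = reported_std / reported_mean
variance = reported_std ** 2

print(f"mean from sum: {mean_from_sum:.5f}")
print(f"CV:            {coefficient_of_variation:.5f}")
print(f"variance:      {variance:.3f}")
```

A CV above 4 together with a skewness above 11 indicates a strongly right-skewed variable, which is consistent with the gap between the median (1.4) and the maximum (530).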

T3
Real number (ℝ)

Distinct: 69
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9987007
Minimum: 0.05
Maximum: 10.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 0.05
5th percentile: 0.7
Q1: 1.6
Median: 2
Q3: 2.3
95th percentile: 3.5
Maximum: 10.6
Range: 10.55
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 0.87016604
Coefficient of variation (CV): 0.43536586
Kurtosis: 10.080257
Mean: 1.9987007
Median Absolute Deviation (MAD): 0.3
Skewness: 1.8163417
Sum: 9475.84
Variance: 0.75718894
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2 1172
24.7%
2.2 233
 
4.9%
1.8 229
 
4.8%
2.1 214
 
4.5%
1.9 208
 
4.4%
2.3 205
 
4.3%
1.7 180
 
3.8%
1.6 177
 
3.7%
1.5 170
 
3.6%
2.5 143
 
3.0%
Other values (59) 1810
38.2%
ValueCountFrequency (%)
0.05 4
 
0.1%
0.1 3
 
0.1%
0.2 37
0.8%
0.3 47
1.0%
0.4 42
0.9%
0.5 34
0.7%
0.6 38
0.8%
0.7 67
1.4%
0.8 80
1.7%
0.9 80
1.7%
ValueCountFrequency (%)
10.6 2
< 0.1%
8.5 2
< 0.1%
7.6 2
< 0.1%
7.3 2
< 0.1%
7.1 4
0.1%
7 2
< 0.1%
6.7 2
< 0.1%
6.6 2
< 0.1%
6.2 2
< 0.1%
6.1 2
< 0.1%

TT4
Real number (ℝ)

Distinct: 241
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.39097
Minimum: 2
Maximum: 430
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 2
5th percentile: 56
Q1: 87
Median: 104
Q3: 127
95th percentile: 181
Maximum: 430
Range: 428
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 39.89298
Coefficient of variation (CV): 0.36468257
Kurtosis: 5.6874509
Mean: 109.39097
Median Absolute Deviation (MAD): 19
Skewness: 1.2193711
Sum: 518622.6
Variance: 1591.4499
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
104 289
 
6.1%
101 85
 
1.8%
87 74
 
1.6%
98 72
 
1.5%
93 70
 
1.5%
103 70
 
1.5%
99 66
 
1.4%
102 66
 
1.4%
91 66
 
1.4%
94 64
 
1.3%
Other values (231) 3819
80.6%
ValueCountFrequency (%)
2 2
 
< 0.1%
2.9 2
 
< 0.1%
3 4
 
0.1%
4 2
 
< 0.1%
4.8 2
 
< 0.1%
5.8 4
 
0.1%
6 2
 
< 0.1%
9.5 2
 
< 0.1%
10 10
0.2%
11 4
 
0.1%
ValueCountFrequency (%)
430 4
0.1%
372 2
< 0.1%
301 2
< 0.1%
289 2
< 0.1%
273 2
< 0.1%
272 2
< 0.1%
263 2
< 0.1%
261 2
< 0.1%
258 2
< 0.1%
257 2
< 0.1%

T4U
Real number (ℝ)

Distinct: 146
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.99842945
Minimum: 0.25
Maximum: 2.32
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 0.25
5th percentile: 0.73
Q1: 0.88
Median: 0.97
Q3: 1.08
95th percentile: 1.38
Maximum: 2.32
Range: 2.07
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.20603477
Coefficient of variation (CV): 0.20635886
Kurtosis: 4.1864897
Mean: 0.99842945
Median Absolute Deviation (MAD): 0.1
Skewness: 1.3540652
Sum: 4733.554
Variance: 0.042450325
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.97 543
 
11.5%
0.9 119
 
2.5%
1.01 113
 
2.4%
0.99 112
 
2.4%
1 107
 
2.3%
0.89 106
 
2.2%
0.92 105
 
2.2%
1.02 105
 
2.2%
0.86 103
 
2.2%
0.93 101
 
2.1%
Other values (136) 3227
68.1%
ValueCountFrequency (%)
0.25 1
 
< 0.1%
0.31 1
 
< 0.1%
0.36 2
< 0.1%
0.38 2
< 0.1%
0.41 2
< 0.1%
0.46 3
0.1%
0.47 1
 
< 0.1%
0.48 3
0.1%
0.49 2
< 0.1%
0.5 4
0.1%
ValueCountFrequency (%)
2.32 2
< 0.1%
2.12 1
 
< 0.1%
2.03 2
< 0.1%
2.01 2
< 0.1%
1.97 2
< 0.1%
1.94 2
< 0.1%
1.93 2
< 0.1%
1.88 4
0.1%
1.84 2
< 0.1%
1.83 4
0.1%

FTI
Real number (ℝ)

Distinct: 234
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 111.09816
Minimum: 2
Maximum: 395
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.1 KiB

Quantile statistics

Minimum: 2
5th percentile: 63
Q1: 93
Median: 107
Q3: 124
95th percentile: 175
Maximum: 395
Range: 393
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 36.56323
Coefficient of variation (CV): 0.32910741
Kurtosis: 7.022334
Mean: 111.09816
Median Absolute Deviation (MAD): 15
Skewness: 1.3459915
Sum: 526716.4
Variance: 1336.8698
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
107 506
 
10.7%
100 85
 
1.8%
93 84
 
1.8%
104 76
 
1.6%
114 76
 
1.6%
98 75
 
1.6%
96 72
 
1.5%
102 72
 
1.5%
97 72
 
1.5%
92 72
 
1.5%
Other values (224) 3551
74.9%
ValueCountFrequency (%)
2 2
< 0.1%
2.8 2
< 0.1%
3 4
0.1%
4 2
< 0.1%
5.4 2
< 0.1%
7 2
< 0.1%
7.6 2
< 0.1%
8.4 2
< 0.1%
8.5 2
< 0.1%
8.9 2
< 0.1%
ValueCountFrequency (%)
395 4
0.1%
362 2
< 0.1%
349 2
< 0.1%
312 2
< 0.1%
291 2
< 0.1%
283 2
< 0.1%
281 2
< 0.1%
280 2
< 0.1%
274 2
< 0.1%
265 1
 
< 0.1%

referral_source
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 320.8 KiB
other: 2703
SVI: 1384
SVHC: 427
STMW: 176
SVHD: 51

Length

Max length: 5
Median length: 5
Mean length: 4.2782113
Min length: 3

Characters and Unicode

Total characters: 20283
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SVHC
2nd row: other
3rd row: other
4th row: other
5th row: SVI

Common Values

ValueCountFrequency (%)
other 2703
57.0%
SVI 1384
29.2%
SVHC 427
 
9.0%
STMW 176
 
3.7%
SVHD 51
 
1.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
other 2703
57.0%
svi 1384
29.2%
svhc 427
 
9.0%
stmw 176
 
3.7%
svhd 51
 
1.1%

Most occurring characters

ValueCountFrequency (%)
o 2703
13.3%
t 2703
13.3%
h 2703
13.3%
e 2703
13.3%
r 2703
13.3%
S 2038
10.0%
V 1862
9.2%
I 1384
6.8%
H 478
 
2.4%
C 427
 
2.1%
Other values (4) 579
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 13515
66.6%
Uppercase Letter 6768
33.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2038
30.1%
V 1862
27.5%
I 1384
20.4%
H 478
 
7.1%
C 427
 
6.3%
T 176
 
2.6%
M 176
 
2.6%
W 176
 
2.6%
D 51
 
0.8%
Lowercase Letter
ValueCountFrequency (%)
o 2703
20.0%
t 2703
20.0%
h 2703
20.0%
e 2703
20.0%
r 2703
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20283
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 2703
13.3%
t 2703
13.3%
h 2703
13.3%
e 2703
13.3%
r 2703
13.3%
S 2038
10.0%
V 1862
9.2%
I 1384
6.8%
H 478
 
2.4%
C 427
 
2.1%
Other values (4) 579
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20283
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 2703
13.3%
t 2703
13.3%
h 2703
13.3%
e 2703
13.3%
r 2703
13.3%
S 2038
10.0%
V 1862
9.2%
I 1384
6.8%
H 478
 
2.4%
C 427
 
2.1%
Other values (4) 579
 
2.9%

Class
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 15
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 345.1 KiB
negative: 3772
sick: 231
compensated hypothyroid: 194
increased binding protein: 149
primary hypothyroid: 95
Other values (10): 300

Length

Max length: 25
Median length: 8
Mean length: 9.5408142
Min length: 4

Characters and Unicode

Total characters: 45233
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: negative
2nd row: negative
3rd row: negative
4th row: negative
5th row: negative

Common Values

ValueCountFrequency (%)
negative 3772
79.6%
sick 231
 
4.9%
compensated hypothyroid 194
 
4.1%
increased binding protein 149
 
3.1%
primary hypothyroid 95
 
2.0%
hyperthyroid 79
 
1.7%
discordant 58
 
1.2%
underreplacement 52
 
1.1%
replacement therapy 38
 
0.8%
overreplacement 34
 
0.7%
Other values (5) 39
 
0.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
negative 3772
69.8%
hypothyroid 291
 
5.4%
sick 231
 
4.3%
compensated 194
 
3.6%
binding 163
 
3.0%
protein 163
 
3.0%
increased 149
 
2.8%
primary 95
 
1.8%
hyperthyroid 79
 
1.5%
discordant 58
 
1.1%
Other values (9) 212
 
3.9%

Most occurring characters

ValueCountFrequency (%)
e 9025
20.0%
i 5187
11.5%
n 4841
10.7%
t 4742
10.5%
a 4447
9.8%
g 3947
8.7%
v 3806
8.4%
r 1286
 
2.8%
o 1136
 
2.5%
d 1075
 
2.4%
Other values (13) 5741
12.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 44557
98.5%
Space Separator 666
 
1.5%
Uppercase Letter 10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 9025
20.3%
i 5187
11.6%
n 4841
10.9%
t 4742
10.6%
a 4447
10.0%
g 3947
8.9%
v 3806
8.5%
r 1286
 
2.9%
o 1136
 
2.5%
d 1075
 
2.4%
Other values (11) 5065
11.4%
Space Separator
ValueCountFrequency (%)
666
100.0%
Uppercase Letter
ValueCountFrequency (%)
T 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 44567
98.5%
Common 666
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 9025
20.3%
i 5187
11.6%
n 4841
10.9%
t 4742
10.6%
a 4447
10.0%
g 3947
8.9%
v 3806
8.5%
r 1286
 
2.9%
o 1136
 
2.5%
d 1075
 
2.4%
Other values (12) 5075
11.4%
Common
ValueCountFrequency (%)
666
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45233
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 9025
20.0%
i 5187
11.5%
n 4841
10.7%
t 4742
10.5%
a 4447
9.8%
g 3947
8.7%
v 3806
8.4%
r 1286
 
2.8%
o 1136
 
2.5%
d 1075
 
2.4%
Other values (13) 5741
12.7%

major_class
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 340.8 KiB
negative: 3772
hypothyroid: 291
sick: 231
binding protein: 163
replacement therapy: 124
Other values (3): 160

Length

Max length: 19
Median length: 8
Mean length: 8.6129509
Min length: 4

Characters and Unicode

Total characters: 40834
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: negative
2nd row: negative
3rd row: negative
4th row: negative
5th row: negative

Common Values

ValueCountFrequency (%)
negative 3772
79.6%
hypothyroid 291
 
6.1%
sick 231
 
4.9%
binding protein 163
 
3.4%
replacement therapy 124
 
2.6%
hyperthyroid 90
 
1.9%
discordant 58
 
1.2%
goitre 12
 
0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
negative 3772
75.0%
hypothyroid 291
 
5.8%
sick 231
 
4.6%
binding 163
 
3.2%
protein 163
 
3.2%
replacement 124
 
2.5%
therapy 124
 
2.5%
hyperthyroid 90
 
1.8%
discordant 58
 
1.2%
goitre 12
 
0.2%

Most occurring characters

ValueCountFrequency (%)
e 8305
20.3%
i 4943
12.1%
t 4634
11.3%
n 4443
10.9%
a 4078
10.0%
g 3947
9.7%
v 3772
9.2%
r 952
 
2.3%
o 905
 
2.2%
y 886
 
2.2%
Other values (10) 3969
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40547
99.3%
Space Separator 287
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 8305
20.5%
i 4943
12.2%
t 4634
11.4%
n 4443
11.0%
a 4078
10.1%
g 3947
9.7%
v 3772
9.3%
r 952
 
2.3%
o 905
 
2.2%
y 886
 
2.2%
Other values (9) 3682
9.1%
Space Separator
ValueCountFrequency (%)
287
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 40547
99.3%
Common 287
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 8305
20.5%
i 4943
12.2%
t 4634
11.4%
n 4443
11.0%
a 4078
10.1%
g 3947
9.7%
v 3772
9.3%
r 952
 
2.3%
o 905
 
2.2%
y 886
 
2.2%
Other values (9) 3682
9.1%
Common
ValueCountFrequency (%)
287
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40834
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 8305
20.3%
i 4943
12.1%
t 4634
11.3%
n 4443
10.9%
a 4078
10.0%
g 3947
9.7%
v 3772
9.2%
r 952
 
2.3%
o 905
 
2.2%
y 886
 
2.2%
Other values (10) 3969
9.7%

Correlations

        age      TSH      T3       TT4      T4U      FTI
age     1.000   -0.063   -0.239   -0.060   -0.196    0.045
TSH    -0.063    1.000   -0.161   -0.288    0.071   -0.317
T3     -0.239   -0.161    1.000    0.553    0.436    0.350
TT4    -0.060   -0.288    0.553    1.000    0.410    0.798
T4U    -0.196    0.071    0.436    0.410    1.000   -0.164
FTI     0.045   -0.317    0.350    0.798   -0.164    1.000

        age      TSH      T3       TT4      T4U      FTI
age     1.000   -0.010   -0.284   -0.055   -0.181    0.071
TSH    -0.010    1.000   -0.258   -0.414    0.058   -0.447
T3     -0.284   -0.258    1.000    0.439    0.426    0.136
TT4    -0.055   -0.414    0.439    1.000    0.397    0.728
T4U    -0.181    0.058    0.426    0.397    1.000   -0.190
FTI     0.071   -0.447    0.136    0.728   -0.190    1.000

        age      TSH      T3       TT4      T4U      FTI
age     1.000   -0.007   -0.200   -0.037   -0.123    0.048
TSH    -0.007    1.000   -0.185   -0.295    0.040   -0.322
T3     -0.200   -0.185    1.000    0.320    0.315    0.095
TT4    -0.037   -0.295    0.320    1.000    0.287    0.566
T4U    -0.123    0.040    0.315    0.287    1.000   -0.131
FTI     0.048   -0.322    0.095    0.566   -0.131    1.000

Correlation matrix over all 24 variables. Each row lists that variable's coefficients against the columns in this order:
age, sex, on_thyroxine, query_on_thyroxine, on_antithyroid_medication, sick, pregnant, thyroid_surgery, I131_treatment, query_hypothyroid, query_hyperthyroid, lithium, goitre, tumor, hypopituitary, psych, TSH, T3, TT4, T4U, FTI, referral_source, Class, major_class

age: 1.000 0.059 0.042 0.000 0.082 0.196 0.253 0.031 0.076 0.075 0.066 0.038 0.082 0.040 0.010 0.164 0.000 0.202 0.061 0.196 0.049 0.241 0.146 0.190
sex: 0.059 1.000 0.128 0.045 0.025 0.012 0.138 0.037 0.012 0.037 0.102 0.000 0.000 0.107 0.024 0.159 0.000 0.149 0.167 0.290 0.130 0.186 0.127 0.157
on_thyroxine: 0.042 0.128 1.000 0.000 0.000 0.059 0.000 0.046 0.094 0.148 0.029 0.000 0.000 0.041 0.000 0.111 0.073 0.110 0.219 0.071 0.290 0.192 0.467 0.565
query_on_thyroxine: 0.000 0.045 0.000 1.000 0.000 0.000 0.084 0.000 0.000 0.038 0.000 0.000 0.060 0.000 0.203 0.023 0.000 0.025 0.000 0.095 0.047 0.033 0.000 0.000
on_antithyroid_medication: 0.082 0.025 0.000 0.000 1.000 0.013 0.121 0.000 0.000 0.015 0.210 0.000 0.000 0.000 0.000 0.016 0.000 0.164 0.081 0.135 0.000 0.083 0.176 0.054
sick: 0.196 0.012 0.059 0.000 0.013 1.000 0.033 0.000 0.024 0.055 0.052 0.000 0.000 0.000 0.000 0.000 0.000 0.136 0.075 0.088 0.075 0.235 0.085 0.102
pregnant: 0.253 0.138 0.000 0.084 0.121 0.033 1.000 0.000 0.000 0.030 0.198 0.000 0.000 0.212 0.000 0.000 0.000 0.368 0.216 0.654 0.115 0.275 0.334 0.394
thyroid_surgery: 0.031 0.037 0.046 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.020 0.039 0.050 0.000 0.000 0.015 0.033 0.016 0.023
I131_treatment: 0.076 0.012 0.094 0.000 0.000 0.024 0.000 0.000 1.000 0.076 0.101 0.000 0.000 0.010 0.000 0.026 0.000 0.000 0.000 0.054 0.000 0.085 0.000 0.000
query_hypothyroid: 0.075 0.037 0.148 0.038 0.015 0.055 0.030 0.000 0.076 1.000 0.000 0.000 0.022 0.048 0.000 0.000 0.046 0.128 0.097 0.000 0.120 0.022 0.112 0.119
query_hyperthyroid: 0.066 0.102 0.029 0.000 0.210 0.052 0.198 0.000 0.101 0.000 1.000 0.037 0.020 0.082 0.000 0.084 0.059 0.332 0.187 0.099 0.271 0.107 0.177 0.205
lithium: 0.038 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.037 1.000 0.000 0.000 0.000 0.034 0.000 0.000 0.000 0.000 0.000 0.121 0.000 0.000
goitre: 0.082 0.000 0.000 0.060 0.000 0.000 0.000 0.000 0.000 0.022 0.020 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.070 0.070 0.005 0.000 0.000
tumor: 0.040 0.107 0.041 0.000 0.000 0.000 0.212 0.000 0.010 0.048 0.082 0.000 0.000 1.000 0.000 0.022 0.000 0.145 0.053 0.171 0.058 0.081 0.326 0.397
hypopituitary: 0.010 0.024 0.000 0.203 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.054 0.011 0.000 0.027
psych: 0.164 0.159 0.111 0.023 0.016 0.000 0.000 0.020 0.026 0.000 0.084 0.034 0.000 0.022 0.000 1.000 0.000 0.091 0.043 0.041 0.082 0.462 0.047 0.067
TSH: 0.000 0.000 0.073 0.000 0.000 0.000 0.000 0.039 0.000 0.046 0.059 0.000 0.000 0.000 0.000 0.000 1.000 0.160 0.417 0.113 0.486 0.049 0.351 0.251
T3: 0.202 0.149 0.110 0.025 0.164 0.136 0.368 0.050 0.000 0.128 0.332 0.000 0.000 0.145 0.000 0.091 0.160 1.000 0.554 0.567 0.838 0.525 0.568 0.493
TT4: 0.061 0.167 0.219 0.000 0.081 0.075 0.216 0.000 0.000 0.097 0.187 0.000 0.000 0.053 0.000 0.043 0.417 0.554 1.000 0.465 0.817 0.307 0.579 0.400
T4U: 0.196 0.290 0.071 0.095 0.135 0.088 0.654 0.000 0.054 0.000 0.099 0.000 0.070 0.171 0.000 0.041 0.113 0.567 0.465 1.000 0.265 0.527 0.499 0.374
FTI: 0.049 0.130 0.290 0.047 0.000 0.075 0.115 0.015 0.000 0.120 0.271 0.000 0.070 0.058 0.054 0.082 0.486 0.838 0.817 0.265 1.000 0.234 0.598 0.455
referral_source: 0.241 0.186 0.192 0.033 0.083 0.235 0.275 0.033 0.085 0.022 0.107 0.121 0.005 0.081 0.011 0.462 0.049 0.525 0.307 0.527 0.234 1.000 0.427 0.299
Class: 0.146 0.127 0.467 0.000 0.176 0.085 0.334 0.016 0.000 0.112 0.177 0.000 0.000 0.326 0.000 0.047 0.351 0.568 0.579 0.499 0.598 0.427 1.000 1.000
major_class: 0.190 0.157 0.565 0.000 0.054 0.102 0.394 0.023 0.000 0.119 0.205 0.000 0.000 0.397 0.027 0.067 0.251 0.493 0.400 0.374 0.455 0.299 1.000 1.000
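
This extract shows several correlation matrices for the numeric variables without stating which coefficient each one uses. As general background, the sketch below contrasts two common coefficients: Pearson (linear association) and Spearman (Pearson applied to ranks). The data and the helper names are made up for illustration and are not taken from the dataset or from ydata-profiling.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

A monotonic but non-linear relationship gives a Spearman coefficient of 1 while Pearson stays below 1, which is one reason the same pair of variables can show different values across matrices.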

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Columns: index | age | sex | flags | TSH | T3 | TT4 | T4U | FTI | referral_source | Class | major_class
flags holds the 14 Boolean columns as one t/f character each, in this order: on_thyroxine, query_on_thyroxine, on_antithyroid_medication, sick, pregnant, thyroid_surgery, I131_treatment, query_hypothyroid, query_hyperthyroid, lithium, goitre, tumor, hypopituitary, psych

0 | 41.0 | F | ffffffffffffff | 1.30 | 2.5 | 125.0 | 1.14 | 109.0 | SVHC | negative | negative
1 | 23.0 | F | ffffffffffffff | 4.10 | 2.0 | 102.0 | 0.97 | 107.0 | other | negative | negative
2 | 46.0 | M | ffffffffffffff | 0.98 | 2.0 | 109.0 | 0.91 | 120.0 | other | negative | negative
3 | 70.0 | F | tfffffffffffff | 0.16 | 1.9 | 175.0 | 0.97 | 107.0 | other | negative | negative
4 | 70.0 | F | ffffffffffffff | 0.72 | 1.2 | 61.0 | 0.87 | 70.0 | SVI | negative | negative
5 | 18.0 | F | tfffffffffffff | 0.03 | 2.0 | 183.0 | 1.30 | 141.0 | other | negative | negative
6 | 59.0 | F | ffffffffffffff | 1.40 | 2.0 | 72.0 | 0.92 | 78.0 | other | negative | negative
7 | 80.0 | F | ffffffffffffff | 2.20 | 0.6 | 80.0 | 0.70 | 115.0 | SVI | negative | negative
8 | 66.0 | F | ffffffffffftff | 0.60 | 2.2 | 123.0 | 0.93 | 132.0 | SVI | negative | negative
9 | 68.0 | M | ffffffffffffff | 2.40 | 1.6 | 83.0 | 0.89 | 93.0 | SVI | negative | negative

826 | 76.0 | M | tfffffftffffff | 6.60 | 1.0 | 91.0 | 1.04 | 88.0 | SVI | sick | sick
835 | 71.0 | F | ffffffffffffff | 1.60 | 0.3 | 73.0 | 0.90 | 81.0 | other | sick | sick
842 | 29.0 | F | ffftffffffffff | 0.20 | 0.9 | 54.0 | 0.76 | 72.0 | SVI | sick | sick
865 | 70.0 | F | ffffffftffffff | 1.70 | 1.0 | 112.0 | 1.02 | 110.0 | SVI | sick | sick
884 | 24.0 | F | tfffffffffffff | 66.00 | 1.0 | 112.0 | 1.01 | 110.0 | other | sick | sick
925 | 54.0 | F | ffffffffffffff | 4.10 | 0.8 | 77.0 | 0.82 | 93.0 | SVI | sick | sick
930 | 55.0 | M | ffffffffffffff | 0.60 | 1.1 | 95.0 | 0.97 | 98.0 | SVI | sick | sick
950 | 35.0 | M | ffffffffffffff | 0.25 | 0.4 | 38.0 | 1.08 | 35.0 | SVI | sick | sick
951 | 78.0 | F | ffffffffffffff | 6.30 | 1.1 | 56.0 | 0.86 | 65.0 | SVI | sick | sick
955 | 81.0 | F | ffffffffffffff | 1.40 | 1.0 | 92.0 | 0.99 | 93.0 | other | sick | sick

Duplicate rows

Most frequently occurring

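Columns: index | age | sex | flags | TSH | T3 | TT4 | T4U | FTI | referral_source | Class | major_class | # duplicates
flags holds the 14 Boolean columns as one t/f character each, in this order: on_thyroxine, query_on_thyroxine, on_antithyroid_medication, sick, pregnant, thyroid_surgery, I131_treatment, query_hypothyroid, query_hyperthyroid, lithium, goitre, tumor, hypopituitary, psych

4 | 26.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 6
5 | 29.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
7 | 32.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
8 | 33.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
15 | 41.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
17 | 51.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
22 | 57.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
25 | 58.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 4
1 | 19.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 3
3 | 22.0 | F | ffffffffffffff | 1.4 | 2.0 | 104.0 | 0.97 | 107.0 | other | negative | negative | 3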
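
Duplicate rows like those listed above can be found by grouping identical records and counting each group. The sketch below uses only the Python standard library on a few made-up rows. It reports both the number of distinct rows that occur more than once and the number of surplus copies, since this extract does not say which of the two the report's "Duplicate rows" figure of 34 refers to.

```python
from collections import Counter

# Hypothetical records: (age, sex, TSH). Real rows would carry all 24 columns.
rows = [
    (26.0, "F", 1.4),
    (26.0, "F", 1.4),
    (26.0, "F", 1.4),
    (29.0, "F", 1.4),
    (29.0, "F", 1.4),
    (41.0, "M", 0.98),
]

# Count how often each distinct record occurs.
counts = Counter(rows)

# Keep only records that appear more than once.
duplicated = {row: n for row, n in counts.items() if n > 1}

distinct_duplicated_rows = len(duplicated)
surplus_copies = sum(n - 1 for n in duplicated.values())
print(distinct_duplicated_rows, surplus_copies)
```

Here `counts[row]` plays the role of the "# duplicates" column: it is the number of times a given record occurs.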